Gramene is a well-established resource for plant comparative genome analysis. Data are generated through automated
and curated analyses and made available through web interfaces such as GrameneMart. The Gramene project was an early 
adopter of the BioMart software, which remains an integral and well-used component of the Gramene website. BioMart 
accessible data sets include plant gene annotations, plant variation catalogues, genetic markers, physical mapping entities, 
public DNA/mRNA sequences of various types and curated quantitative trait loci for various species. 

Database URL: http://www.gramene.org/biomart/martview 



Project description 

The Gramene project (http://www.gramene.org) was 
launched in March 2001 as a curated, open-source,
Web-accessible data resource for comparative genome analysis
(1). Gramene's purpose is to provide added value to data 
sets available within the public sector, which facilitate 
researchers' ability to understand plant genomes and take 
advantage of genomic sequence known in one species for 
identifying and understanding corresponding genes, path- 
ways and phenotypes in other grass species. This is achieved 
by building automated and curated relationships between 
species that can be queried and displayed using web-based 
interfaces such as GrameneMart. 

In the 10 years since its release, the Gramene database 
has increased in scope and scale. The 34th build released in 
October 2011 hosts 49M genetic markers and associated 
DNA sequences from hundreds of plant species, and 
22 assembled plant genomes (14 completed and 8 partial). 
Figure 1 shows the growth of the Gramene database in 
numbers of complete and partially sequenced species rep- 
resented in Gramene's genomes module. 

Alongside code developed at Gramene, the project has
since January 2002 used code developed by the Ensembl
project (2) for the genome browser. Gramene's first release
including BioMart software (3) was version 18 in July 2005,
which used the data transformation tools provided by
Ensembl.

With the launch of the Ensembl Genomes project in 2009
(4), Gramene has entered into a close collaboration over
the generation of plant Ensembl databases, including the
sharing of Plants Genes and Plants Variations BioMarts;
these databases, and the interfaces used to query them,
are mirrored at both the Gramene and Ensembl Genomes
websites. The first coordinated release was Gramene v30,
Ensembl Genomes release 3 in October 2009.

For the October 2011 release of Gramene, v34, both
GrameneMart and the Gramene website used BioMart
version 0.7 software.

Query examples 

The MartView web interface for GrameneMart is found at
http://www.gramene.org/biomart/martview, with access
also available from the central BioMart portal
http://www.biomart.org/biomart/martview. Mirrors of the Plants Genes
Mart and Plants Variations Mart are available through
Ensembl Genomes: http://plants.ensembl.org/biomart/martview.

Figure 1. Growth in the number of species represented in
Gramene's genomes module.

To demonstrate the features of the GrameneMart
database, we present example queries in Table 1.
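
Queries of this kind can also be written in the XML query format that BioMart 0.7 accepts alongside MartView. The Python sketch below assembles such a query and the matching request URL without sending it. The `martservice` endpoint path, the data set name `example_gene_dataset`, the `datasetConfigVersion` value and the filter and attribute names are illustrative assumptions, not values taken from GrameneMart; real names should be looked up in the MartView interface.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: a martservice endpoint sits next to martview on the same host.
MARTSERVICE_URL = "http://www.gramene.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string for a single data set."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",            # tab-separated results
        "header": "1",                 # include a header row
        "uniqueRows": "1",             # collapse duplicate rows
        "datasetConfigVersion": "0.6", # assumed value; check an exported query
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def query_url(xml_query):
    """Return the GET URL that would submit the query (nothing is sent here)."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml_query})


# Hypothetical data set, filter and attribute names, for illustration only.
example_xml = build_query(
    "example_gene_dataset",
    {"chromosome_name": "1"},
    ["gene_id", "description"],
)
example_url = query_url(example_xml)
```

Building the XML with an element tree rather than string concatenation keeps filter values correctly escaped.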

Data content 

There are five BioMart databases in Gramene: Plants Genes
Mart, Plants Variations Mart, Gramene Markers, Gramene
Mappings and Gramene QTL. Of these, the two Plants
Marts are developed with and mirrored by Ensembl
Genomes, whereas the three others, Gramene Markers,
Gramene Mappings and Gramene QTL, are unique to
Gramene. Each database is described below.

Plant Gene Mart database 

As of October 2011, Gramene's version of the Plants Genes
Mart database holds a data set for each of the 14 fully 
sequenced plant genomes represented in Gramene. One 
useful feature of the Plant Genes database is the ability 
to map gene identifiers of one type to those of another 
using the extensive set of gene-anchored cross-references. 
Cross-reference sources from the Plant Gene Mart include 
EMBL (5), EntrezGene (6), IPI (7), PDB (8), RefSeq (9), 
UniProt (10) and UniGene (11), PlantGDB transcript assem- 
blies (12) and identifiers from The Gene Index (13). There 
are also a number of species-specific identifiers, including 
BGI-RIS (14), the Rice Genome Annotation Project (15) and 
RAP-DB (16) gene identifiers for rice; IGGP (17) gene iden- 
tifiers for grape; JGI gene identifiers for Arabidopsis lyrata 
(18), sorghum (19) and poplar (20); and TAIR (21) identifiers 
for Arabidopsis thaliana.

Cross-references to other Gramene modules, such as
Genes, Markers/Sequences (see below) and Pathways (1)
are also represented and are used to make other gene
assignments to EC Number and terms from various
ontologies including the Plant Ontology and Gene Ontology (22).

The Plant Gene Mart allows users to filter by, and export 
attributes related to, genomic region, gene biotype, data- 
base cross references, ontology terms, orthology relation- 
ships with genes in other species or paralogy with genes in 
the same species, protein domain annotation and, where 
available, consequences of any underlying genomic vari- 
ants. An example query against the Plant Gene Mart data- 
base is included as 'Query 1' in Table 1. 
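
The identifier mapping described above amounts to exporting two attributes (a gene identifier and a cross-reference) and indexing one by the other. The following is a minimal Python sketch of that post-processing step, assuming the results were exported as tab-separated text with a header row. The column names and the identifiers in the sample are placeholders invented for illustration, not real Gramene records.

```python
import csv
import io
from collections import defaultdict


def build_id_map(tsv_text, source_col, target_col):
    """Map each identifier in source_col to the sorted identifiers in target_col."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = defaultdict(set)
    for row in reader:
        source, target = row[source_col], row[target_col]
        # Skip genes that have no cross-reference of the requested type.
        if source and target:
            mapping[source].add(target)
    return {gene: sorted(xrefs) for gene, xrefs in mapping.items()}


# Placeholder export: one gene with two cross-references, one gene with none.
sample_tsv = (
    "gene_id\txref_id\n"
    "GENE_A\tXREF_1\n"
    "GENE_A\tXREF_2\n"
    "GENE_B\t\n"
)
id_map = build_id_map(sample_tsv, "gene_id", "xref_id")
```

A gene can carry several cross-references of the same type, so each key maps to a list rather than a single value.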

Plant Variation Mart database 

The Plant Variation Mart database holds a catalogue of 
DNA variants including single nucleotide polymorphisms 
(SNPs) and insertions/deletions (indels) for A. thaliana, 
rice (Japonica Group), rice (Indica Group) and grape. 
The A. thaliana data set contains over 8 700 000 variants
compiled from a number of studies (23,24), representing 
SNP discovery and genotyping across over 1000 
Arabidopsis accessions. In addition to the variants, their 
genomic locations and corresponding genetic conse- 
quences, a number of trait association data (23) are also 
exposed. The rice data sets, both Indica and Japonica, con- 
tain approx. 5 500 000 variants, predominantly from dbSNP 
(25), but also 150 000 SNPs discovered in 20 accessions by 
OryzaSNP (26) and data from a 1536 SNP panel genotyped 
across 395 accessions (27). The grape variation data set con- 
tains 460 000 SNPs discovered by next-generation sequen- 
cing of 18 grape cultivars (28). 

The Plant Variation Mart allows users to filter by, and 
export attributes related to, genomic region, variation ID, 
phenotype association, variation set/study, strain/accession, 
gene association and consequence. An example query 
against the Plant Variation Mart database is included as 
'Query 2' in Table 1. 

Gramene Markers database 

This database contains all of the genetic marker and asso- 
ciated DNA/mRNA sequence records represented in 
Gramene, which number 49M entries as of October 2011. 
Unlike the Gene and Variation Mart databases that aim to 
be comprehensive catalogues for their respective genomes, 
the Markers database represents a wide range of entities 
from many species that have been made available in the 
public domain, e.g. through databases such as GenBank. 

Entities in the Gramene Markers database are classified 
by type, with a data set created for each type. The different 
types and the numbers of each are shown in Table 2. 
Markers are connected to each other via correspondences, 
e.g. individual Expressed Sequence Tags correspond to EST 
Clusters of which they are members, and these associations 
are represented in the database and exposed as filters/at- 
tributes. The database also represents groupings of markers 
within a type into libraries. 
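
The correspondence relationship can be pictured as a symmetric index between markers. The Python sketch below is only an illustration of that idea and is not the Gramene schema: the class name, the representation of a marker as a `(type, name)` pair and the sample names are all assumptions.

```python
from collections import defaultdict


class CorrespondenceIndex:
    """Symmetric index of marker-to-marker correspondences (illustrative only)."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, marker_a, marker_b):
        """Record a correspondence in both directions."""
        self._links[marker_a].add(marker_b)
        self._links[marker_b].add(marker_a)

    def corresponding(self, marker, marker_type=None):
        """Return markers linked to `marker`, optionally restricted to one type."""
        found = self._links.get(marker, set())
        if marker_type is not None:
            found = {m for m in found if m[0] == marker_type}
        return sorted(found)


# Markers are (type, name) pairs; the names are made up for this example.
index = CorrespondenceIndex()
index.add(("EST", "est_0001"), ("EST cluster", "cluster_A"))
index.add(("EST", "est_0002"), ("EST cluster", "cluster_A"))
members = index.corresponding(("EST cluster", "cluster_A"), marker_type="EST")
```

Storing each link in both directions lets the same lookup answer "which cluster does this EST belong to" and "which ESTs make up this cluster".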

Table 2. Marker types and number of records of each represented as data sets in the Gramene Markers BioMart database

Marker type/BioMart data set                        Number of records
Amplified Fragment Length Polymorphism, AFLP        8150
Breakpoint interval                                 303
Centromere                                          57
Clone                                               2 242 577
Deletion                                            333
EST cluster                                         6 154 296
Expressed Sequence Tag, EST                         20 690 805
Fluorescent In Situ Hybridization, FISH, probes     37
Fingerprint Contig, FPC                             17 479
Genome Survey Sequence, GSS                         10 653 993
Gene prediction                                     354 564
Gene                                                10 781
Genomic DNA                                         5 263 129
Insertion Site-Based Polymorphism, ISBP             691
Insertion                                           310
Microarray probe                                    260 656
mRNA                                                651 207
Overgo hybridization probe                          24 464
Oligonucleotide                                     2 396 466
Point                                               332
Primer                                              80 555
Probed site                                         11 532
Quantitative Trait Locus, QTL                       11 625
Random Amplification of Polymorphic DNA, RAPD       175
Restriction fragment length polymorphism            18 761
Simple Sequence Repeat, SSR                         24 422
Sequence Tagged Site, STS                           3437
Telomere                                            20

The Gramene Markers database allows users to filter by,
and export attributes related to, species, germplasm,
name/synonym, library/source and related (corresponding)
entities. An example query against the Gramene Markers
database is included as 'Query 3' in Table 1.

Gramene Mappings database 

The Gramene Mappings database models the mappings
between markers (described above) and molecular maps.
The various map types (bin, cytogenetic, deletion, genetic,
physical, quantitative trait loci (QTL) and sequence) are each
modelled as a separate data set.

The Gramene Mappings database allows users to filter
by, and export attributes related to, species, map set/map
(e.g. chromosome), map position, marker name, marker
type and analysis. An example query against the Gramene
Mappings database is included as 'Query 4' in Table 1.

Gramene QTL database 

The QTL database contains details of all QTL in Gramene; 
currently 11 624 from 10 species. The focus of this database
is querying and reporting QTL by Trait Ontology (22) term. 
An example query against the Gramene QTL database is 
included as 'Query 5' in Table 1. 

Discussion and future directions 

Gramene has been a long-standing user of the BioMart 
software. We have deployed both the Ensembl data
transformation and interface configuration, and also developed
Gramene-specific transformations from our MySQL-based 
data resources on custom schemas for which we made 
extensive use of the BioMart MartBuilder and MartEditor 
software. GrameneMart has become an integral and 
well-used component of the Gramene website. We will 
adopt software updates from both Ensembl and BioMart
projects as appropriate. 

Gramene's future BioMart efforts will be focused on 
data federation, both internally between Gramene data- 
bases, and externally with third-party data sets. We antici- 
pate that this will significantly increase the utility of 
Gramene's extensive ontology and phenotypic linkage 
data. 
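
As an indication of what a federated query might look like, the BioMart 0.7 XML query format allows a second Dataset element in a single query when the two data sets are linked. The Python sketch below builds such a two-data-set query. It is a sketch under stated assumptions and not a description of Gramene's planned implementation: the data set, filter and attribute names and the `datasetConfigVersion` value are hypothetical, and whether two given data sets can be joined depends on links configured in the mart, which are not described here.

```python
import xml.etree.ElementTree as ET


def build_linked_query(datasets):
    """Return BioMart-style XML with one Dataset element per entry in `datasets`.

    Each entry is a dict with a "name" and optional "filters" and "attributes".
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",  # assumed value
    })
    for spec in datasets:
        ds = ET.SubElement(query, "Dataset", {"name": spec["name"], "interface": "default"})
        for name, value in spec.get("filters", {}).items():
            ET.SubElement(ds, "Filter", {"name": name, "value": value})
        for name in spec.get("attributes", []):
            ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Hypothetical pairing of a QTL data set with a gene data set.
linked_xml = build_linked_query([
    {"name": "example_qtl_dataset",
     "filters": {"trait_term": "example trait"},
     "attributes": ["qtl_id"]},
    {"name": "example_gene_dataset",
     "attributes": ["gene_id"]},
])
```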